A Semi-Discrete Matrix Decomposition for Latent Semantic Indexing in Information Retrieval
Abstract
The vast amount of textual information available today is useless unless it can be effectively and efficiently searched. In information retrieval, we wish to match queries with relevant documents. Documents can be represented by the terms that appear within them, but literal matching of terms does not necessarily retrieve all relevant documents. Latent Semantic Indexing represents documents by approximations and tends to cluster documents on similar topics even if their term profiles are somewhat different. This approximate representation is usually accomplished using a low-rank singular value decomposition (SVD) approximation. In this paper, we use an alternate decomposition, the semi-discrete decomposition (SDD). In our tests, for equal query times, the SDD does as well as the SVD and uses less than one-tenth the storage. Additionally, we show how to update the SDD for a dynamically changing document collection.
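To make the idea of a semi-discrete approximation concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the usual form of a rank-k SDD, A_k = sum of d_i * x_i * y_i^T, where each d_i is a positive scalar and every entry of x_i and y_i is -1, 0, or 1. The names `sdd_scores` and `storage_bits` are hypothetical, and so are the toy values, the plain q^T A_k scoring rule, and the choice of 2 bits per ternary entry and 64 bits per real number. The storage comparison is made at equal rank k purely for illustration; the abstract's comparison is at equal query time, which this sketch does not reproduce.

```python
from typing import List, Tuple

# One SDD term: a scalar weight d, a term vector x, and a document vector y.
# Assumption: every entry of x and y is -1, 0, or 1.
Triplet = Tuple[float, List[int], List[int]]


def sdd_scores(query: List[float], triplets: List[Triplet]) -> List[float]:
    """Score each document as q^T A_k, with A_k = sum_i d_i * x_i * y_i^T."""
    n_docs = len(triplets[0][2])
    scores = [0.0] * n_docs
    for d, x, y in triplets:
        # Inner product of the query with the ternary term vector x_i.
        qx = sum(q * xi for q, xi in zip(query, x))
        for j, yj in enumerate(y):
            scores[j] += d * qx * yj
    return scores


def storage_bits(m: int, n: int, k: int, float_bits: int = 64) -> Tuple[int, int]:
    """Return (svd_bits, sdd_bits) for rank-k factors of an m-by-n matrix.

    Assumes float_bits per real number and 2 bits per ternary entry.
    """
    svd = k * (m + n + 1) * float_bits    # U_k, V_k, and k singular values
    sdd = k * (2 * (m + n) + float_bits)  # ternary x_i, y_i, and k scalars d_i
    return svd, sdd


# Toy example with made-up values: 3 terms, 2 documents, 2 SDD terms.
triplets = [
    (2.0, [1, 1, 0], [1, 0]),
    (1.0, [0, -1, 1], [0, 1]),
]
scores = sdd_scores([1.0, 0.0, 1.0], triplets)
svd_bits, sdd_bits = storage_bits(1000, 500, 10)
```

Because the vectors are ternary, each one can be stored in a couple of bits per entry rather than as floating-point numbers, which is where the storage saving in this sketch comes from.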
Similar resources
Information Retrieval System in Bahasa Indonesia Using Latent Semantic Indexing and Semi-Discrete Matrix Decomposition
The focus of this paper is exploring the use of Latent Semantic Indexing (LSI) and Semi-Discrete Matrix Decomposition (SDD) in a Bahasa Indonesia information retrieval system. The method is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries...
Latent Semantic Indexing via a Semi-Discrete Matrix Decomposition
With the electronic storage of documents comes the possibility of building search engines that can automatically choose documents relevant to a given set of topics. In information retrieval, we wish to match queries with relevant documents. Documents can be represented by the terms that appear within them, but literal matching of terms does not necessarily retrieve all relevant documents. There...
Clustered SVD strategies in latent semantic indexing
The text retrieval method using Latent Semantic Indexing (LSI) technique with truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term-document matrix and improves the information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small homogeneous data collect...
Matrices with Low-Rank-Plus-Shift Structure: Partial SVD and Latent Semantic Indexing
We present a detailed analysis of matrices satisfying the so-called low-rank-plus-shift property in connection with the computation of their partial singular value decomposition. The application we have in mind is Latent Semantic Indexing for information retrieval where the term-document matrices generated from a text corpus approximately satisfy this property. The analysis is motivated by develo...
Publication date: 1997